In this report I perform a text analysis of publicly available review data posted on Booking for the major Italian tourist cities: Milan, Rome, Florence, Venice and Verona.
Below are some of the key findings.
The dataset comprises 12 columns, which are the following:
en_reviewResponse
There are 87,633 English reviews in the dataset. The reviews range from 2016-12-09 to 2019-01-09.
As the graph shows, hotels received more reviews in the summer and spring months than in the winter months, which suggests a higher number of tourists in those seasons.
The highest number of weekly reviews was recorded around the middle of 2018, when the hotels received almost 1,250 reviews in a single week.
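As a rough illustration of how weekly review counts like these can be derived, the sketch below groups review dates by the Monday that starts each week and picks the busiest week. It is written in Python with the standard library only and is not the code used for this report; the function names and the sample dates are assumptions made up for the example.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week that contains d."""
    return d - timedelta(days=d.weekday())


def weekly_review_counts(review_dates):
    """Count reviews per week, keyed by the Monday that starts each week."""
    return Counter(week_start(d) for d in review_dates)


# Made-up dates for illustration only; they are not taken from the dataset.
sample_dates = [
    date(2018, 6, 4),   # Monday
    date(2018, 6, 6),   # Wednesday of the same week
    date(2018, 6, 10),  # Sunday of the same week
    date(2018, 6, 11),  # Monday of the following week
]

counts = weekly_review_counts(sample_dates)
busiest_week, n_reviews = counts.most_common(1)[0]
print(busiest_week, n_reviews)
```

Keying each week by its Monday keeps weeks that span a year boundary together, which a plain (year, week number) pair built from the calendar year would not.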
The following graph shows the average score that tourists of different nationalities assigned to hotels. Furthermore, for each country, the number of tourists who left a review is highlighted. The interactive dashboard also makes it possible to filter the graph by continent, showing only the countries that belong to Europe, North and South America, Asia, Africa, Oceania or Antarctica.
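The aggregation behind such a graph can be sketched as follows: for each nationality, compute the average score and the number of reviews, optionally keeping only one continent. This is a minimal Python sketch under assumed inputs, not the dashboard's actual code; the tuple layout (nationality, continent, score) and the sample values are hypothetical.

```python
from collections import defaultdict


def score_by_nationality(reviews, continent=None):
    """Return {nationality: (average_score, review_count)}.

    reviews is an iterable of (nationality, continent, score) tuples.
    If continent is given, only reviews from that continent are kept.
    """
    totals = defaultdict(lambda: [0.0, 0])  # nationality -> [score sum, count]
    for nationality, review_continent, score in reviews:
        if continent is not None and review_continent != continent:
            continue
        totals[nationality][0] += score
        totals[nationality][1] += 1
    return {nat: (total / n, n) for nat, (total, n) in totals.items()}


# Made-up rows for illustration only; they are not taken from the dataset.
sample_reviews = [
    ("Italy", "Europe", 8.0),
    ("Italy", "Europe", 9.0),
    ("Japan", "Asia", 7.5),
]

all_countries = score_by_nationality(sample_reviews)
europe_only = score_by_nationality(sample_reviews, continent="Europe")
print(europe_only)
```

Reporting the review count next to the average matters because an average based on a handful of reviews is much less reliable than one based on thousands.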
The importance of words can be illustrated with a word cloud. The word cloud clearly shows that “hotel”, “location”, “breakfast” and “staff” are the four most important words in Booking reviews of Italian tourist cities.
In contrast to English reviews, the most important word in Italian reviews is “breakfast”. For Italian-speaking guests, food appears to be a particularly valued topic in the context of hospitality.
We often want to understand the relationship between words in a review. What sequences of words are common across review text? Given a sequence of words, what word is most likely to follow? What words have the strongest relationship with each other?
The above graph visualizes the common bigrams in English reviews, showing those that occurred at least 500 times and where neither word was a stop-word.
The network graph shows strong connections among the most frequent words (“friendly”, “staff”, “excellent” and “location”, “train” and “station”).
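The bigram counting described above can be sketched in a few lines: split each review into words, pair every word with the one that follows it, drop pairs in which either word is a stop word, and keep the pairs that reach a minimum count. The sketch is in Python with the standard library only and is not the pipeline used for this report; the stop-word list is a tiny illustrative assumption, the sample reviews are made up, and the threshold is lowered from 500 to 2 so that the toy data produces a result.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a real lexicon).
STOP_WORDS = {"the", "was", "and", "a", "to", "is", "very", "of", "in"}


def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def count_bigrams(reviews, stop_words=STOP_WORDS):
    """Count adjacent word pairs in which neither word is a stop word."""
    counts = Counter()
    for review in reviews:
        words = tokenize(review)
        for first, second in zip(words, words[1:]):
            if first not in stop_words and second not in stop_words:
                counts[(first, second)] += 1
    return counts


def frequent_bigrams(counts, min_count):
    """Keep only the bigrams that occur at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up reviews for illustration only; they are not taken from the dataset.
sample_reviews = [
    "The staff was friendly and the location was excellent.",
    "Friendly staff, close to the train station.",
    "Friendly staff and excellent location near the train station.",
]

bigram_counts = count_bigrams(sample_reviews)
edges = frequent_bigrams(bigram_counts, min_count=2)
print(edges)
```

Each remaining pair can serve as an edge in a network graph, with the count as the edge weight. Note that this simple tokenizer pairs words across punctuation, so a pair such as ("staff", "close") is also counted.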
One way to analyze the sentiment of a text is to consider the text as a combination of its individual words and the sentiment content of the whole text as the sum of the sentiment content of the individual words. With the text in a one-word-per-row format, sentiment analysis can be done as an inner join between the review words and a sentiment lexicon. Three sentiment lexicons are available via the get_sentiments() function. Let’s look at the words with a joy score from the NRC lexicon.
What are the most common joy words?
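The inner-join idea can be illustrated with a small sketch: keep only the review words that also appear in the lexicon, then count them. The sketch is in Python rather than the R workflow built around get_sentiments(), and the word set below is a hypothetical stand-in, not the actual NRC joy list; the tokens are made up as well.

```python
from collections import Counter

# Hypothetical stand-in for the joy entries of a sentiment lexicon.
# These words are an assumption for illustration, not the real NRC data.
JOY_WORDS = {"beautiful", "excellent", "friendly", "lovely"}


def joy_word_counts(tokens, joy_words=JOY_WORDS):
    """Count the tokens that also appear in the lexicon.

    Filtering on membership in the lexicon plays the role of an
    inner join on the word column: unmatched tokens are dropped.
    """
    return Counter(token for token in tokens if token in joy_words)


# Made-up tokens for illustration only; they are not taken from the dataset.
tokens = ["friendly", "staff", "excellent", "location", "friendly", "lovely", "room"]

joy_counts = joy_word_counts(tokens)
top_word, top_count = joy_counts.most_common(1)[0]
print(top_word, top_count)
```

Words that are absent from the lexicon, such as “staff” or “room” here, simply disappear from the result, which is exactly the behaviour of an inner join.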
The aim is to determine the attitude of a reviewer (i.e. a hotel guest) with respect to their past experience of, or emotional reaction towards, the hotel. The attitude may be a judgment or an evaluation.